BIOSTATISTICS
... wait. What!?
Erik Kusch
erik.kusch@au.dk
Section for Ecoinformatics & Biodiversity
Center for Biodiversity and Dynamics in a Changing World (BIOCHANGE)
Aarhus University
28/10/2020
Aarhus University Biostatistics - Why? What? How? 1 / 15
1 Should you care?
2 Biological Terminology
3 Issues
4 Me
Should you care?
The Big Question
Should you care about biostatistics?
YES!
Thank you for attending my TED talk.
Biological Terminology
No, biostatistics is not just for math nerds.
Statisticians often lack important biological background:
Population vs. Sample
Species, Family, Taxon, etc.
Interpretation of results
Biologists often lack important statistical background:
Unsupervised vs. Supervised Approaches
Statistical Assumptions
Parametric vs. Non-Parametric Tests
Basic Statistics
How often do you actually check assumptions?
Assumptions:
Normality
Independence
Homogeneity of variances
Testing? Remedies?
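The assumption checks listed above can be sketched in a few lines. The snippet below is plain Python (standard library only) rather than R. The helper names `skewness` and `brown_forsythe_w` are illustrative assumptions, not functions from any package. Sample skewness gives a rough hint of a non-normal shape. The Brown-Forsythe variant of Levene's statistic compares the spread of several groups. Neither function returns a p-value; for that, W would be compared against an F(k - 1, N - k) distribution in a proper statistics package. Remedies such as transforming the response or switching to a non-parametric test are not shown.

```python
import statistics

def skewness(xs):
    """Moment-based sample skewness (g1). Values far from 0 hint at a non-normal shape."""
    n = len(xs)
    m = statistics.fmean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def brown_forsythe_w(*groups):
    """Levene-type W statistic built on absolute deviations from each group's median
    (the Brown-Forsythe variant). Larger W points to unequal variances; a p-value
    would come from an F(k - 1, N - k) distribution, which is not computed here."""
    k = len(groups)
    medians = [statistics.median(g) for g in groups]
    z = [[abs(x - med) for x in g] for g, med in zip(groups, medians)]
    n_total = sum(len(zg) for zg in z)
    z_means = [statistics.fmean(zg) for zg in z]
    grand_mean = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - grand_mean) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within

# Illustrative (made-up) measurements for two treatment groups
group_a = [4.1, 5.0, 5.2, 4.8, 5.5, 4.9]
group_b = [2.0, 7.5, 3.1, 8.2, 5.0, 9.4]
print("skewness A:", round(skewness(group_a), 3))
print("Brown-Forsythe W:", round(brown_forsythe_w(group_a, group_b), 2))
```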
Scales and Distributions:
Continuous, Categorical
Nominal, Binary, Ordinal, Interval, Ratio, Integer
Gaussian (Normal), Binomial, Poisson
Distinguish them?
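The scale of a variable largely decides which distribution is a sensible starting point. A rough way to make that link explicit is sketched below, again in plain Python. `guess_family` is a hypothetical rule of thumb: binary values suggest Binomial, non-negative integer counts suggest Poisson, and anything else is treated as Gaussian. `dispersion_index` is the variance-to-mean ratio, which should be close to 1 for Poisson counts and clearly above 1 for overdispersed counts.

```python
import statistics

def guess_family(values):
    """Hypothetical rule of thumb: suggest a distribution family from the values alone."""
    if set(values) <= {0, 1}:
        return "binomial"   # binary outcomes, e.g. presence/absence
    if all(v >= 0 and float(v).is_integer() for v in values):
        return "poisson"    # non-negative integer counts
    return "gaussian"       # anything else is treated as continuous

def dispersion_index(counts):
    """Sample variance divided by sample mean: roughly 1 for Poisson counts,
    clearly above 1 when counts are overdispersed."""
    return statistics.variance(counts) / statistics.fmean(counts)

# Illustrative (made-up) count data, e.g. individuals per plot
plot_counts = [0, 0, 0, 10]
print(guess_family(plot_counts), dispersion_index(plot_counts))
```

Study design should always override such heuristics. A 0/1 column may just as well be a dummy-coded category, and count data can be analysed on other scales.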
Correlations
Correlation is not necessarily causation.
Correlation tests yield two measurements:
r value (strength and direction of the correlation)
r ≈ 1 (strong, positive correlation)
r ≈ 0 (no linear correlation)
r ≈ -1 (strong, negative correlation)
p value (measure of statistical significance)
Get a feeling for it here http://guessthecorrelation.com/
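Both numbers a correlation test reports can be reproduced by hand. The sketch below uses plain Python and hypothetical helper names. It computes Pearson's r and a two-sided permutation p-value. Shuffling y breaks any real association, so the p-value is the share of shuffles whose |r| is at least as large as the observed one. Standard tests such as R's `cor.test()` derive the p-value from a t-distribution instead. The permutation approach is swapped in here only because it needs no extra libraries.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson's r: covariance of x and y scaled by the product of their spreads."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def permutation_p(x, y, n_perm=5000, seed=1):
    """Two-sided permutation p-value for r: the share of random shuffles of y
    with |r| at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)  # +1 counts the observed arrangement itself

# Illustrative (made-up) paired measurements
body_size = [1, 2, 3, 4, 5, 6, 7, 8]
clutch_size = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
print(pearson_r(body_size, clutch_size), permutation_p(body_size, clutch_size))
```

A large |r| with a small p only says the association is strong and unlikely under "no association". It says nothing about which variable drives the other.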
Advanced Statistics
What do you want to analyse and predict?
Classification and Clustering:
K-Means
Support-Vector Machines
Hierarchies
Networks
When to use which one?
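One way into "which one?" is to ask whether you have known group labels. K-Means and hierarchical clustering are unsupervised and find groups without labels. Support-vector machines are supervised and learn from labelled examples. As an illustration of the unsupervised case, here is a minimal sketch of Lloyd's algorithm for k-means in plain Python. The function name `kmeans` and its parameters are assumptions for this example. Real analyses would typically use a library implementation with multiple random starts.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for k-means clustering of 2-D (or n-D) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's old centroid
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels

# Illustrative (made-up) 2-D trait measurements forming two loose groups
traits = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, groups = kmeans(traits, k=2)
print(centres, groups)
```

The number of clusters k has to be chosen up front. Results can depend on the starting centroids, which is why the seed is exposed.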
Regression:
Linear Models
Least Squares vs. Maximum Likelihood
Mixed-Effects Models
GLS, GLM, and GAM
How do you select the best model?
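Least squares and maximum likelihood coincide for linear models when residuals are assumed independent and normally distributed with constant variance. Under that assumption, maximising the Gaussian log-likelihood over the coefficients is the same as minimising the residual sum of squares. This link lets you rank least-squares fits by AIC, as in the sketch below. The plain-Python helpers `fit_polynomial` and `gaussian_aic` are hypothetical. The normal-equations solver is for illustration only and is not numerically robust for large or ill-conditioned designs; in R one would normally use `lm()` and `AIC()`.

```python
import math

def fit_polynomial(x, y, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via the normal equations (X'X) b = X'y."""
    m = degree + 1
    X = [[xi ** p for p in range(m)] for xi in x]
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(m):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(m)]

def gaussian_aic(x, y, coefs):
    """AIC = 2k - 2 ln(L) for a least-squares fit with normal errors.
    At the maximum, sigma^2 = RSS / n, so ln(L) = -n/2 * (ln(2 * pi * RSS / n) + 1);
    k counts the coefficients plus the error variance."""
    n = len(y)
    rss = sum((yi - sum(c * xi ** p for p, c in enumerate(coefs))) ** 2 for xi, yi in zip(x, y))
    log_lik = -n / 2 * (math.log(2 * math.pi * rss / n) + 1)
    k = len(coefs) + 1
    return 2 * k - 2 * log_lik

# Illustrative (made-up) data with a slight curvature
x = [0, 1, 2, 3, 4, 5]
y = [1.1, 3.3, 7.15, 11.45, 17.2, 23.4]
aic_linear = gaussian_aic(x, y, fit_polynomial(x, y, 1))
aic_quadratic = gaussian_aic(x, y, fit_polynomial(x, y, 2))
print(aic_linear, aic_quadratic)  # lower AIC = preferred model
```

AIC trades goodness of fit against the number of parameters. It only ranks the candidate models you supply and says nothing about whether any of them is adequate.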
Issues
Statistical Significance - the p-value
Misconceptions
The p-value is not designed to tell us whether something is strictly true or false
It is not the probability of the null hypothesis being true
The size of p ≠ the strength of an observed effect
Alternatives
Effect Sizes
Confidence Intervals
Akaike Information Criterion (AIC)
Bayes Factor
Credible Intervals
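The gap between a small p-value and a large effect can be shown by reporting an effect size and a confidence interval next to p. In the sketch below, two simulated samples differ by only 0.1 standard deviations, but each holds 20,000 observations. The helper `two_sample_summary` is a hypothetical name. For simplicity it uses a large-sample normal approximation for the CI and p-value. R's `t.test()` would use a t-distribution (Welch by default), which matters for small samples.

```python
import random
from statistics import NormalDist, fmean, stdev

def two_sample_summary(a, b, level=0.95):
    """Mean difference with a large-sample (normal approximation) confidence interval,
    a two-sided z-test p-value, and Cohen's d based on the pooled standard deviation."""
    na, nb = len(a), len(b)
    diff = fmean(a) - fmean(b)
    sa, sb = stdev(a), stdev(b)
    se = (sa ** 2 / na + sb ** 2 / nb) ** 0.5
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level=0.95
    p = 2 * (1 - NormalDist().cdf(abs(diff) / se))
    pooled_sd = (((na - 1) * sa ** 2 + (nb - 1) * sb ** 2) / (na + nb - 2)) ** 0.5
    return {
        "diff": diff,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "p": p,
        "cohens_d": diff / pooled_sd,
    }

# Simulated (made-up) data: a tiny true difference of 0.1 SD, but a huge sample
rng = random.Random(42)
treated = [rng.gauss(10.1, 1.0) for _ in range(20000)]
control = [rng.gauss(10.0, 1.0) for _ in range(20000)]
result = two_sample_summary(treated, control)
print(result)
```

With these settings the p-value comes out far below 0.05, while Cohen's d stays near 0.1, conventionally a small effect. Significance and effect size answer different questions.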
Coding Etiquette
R Coding
- Object Modes
- Object Types
- Sub-setting
- Vectorisation
- Statements, Loops
- Functions, Packages
Coding Schools
- Hard-coding vs. Soft-coding
- Base plot vs. ggplot2
- Base code vs. tidyverse
And what about GitHub?
Manuscript Workflow
Using R Markdown for your research comes with a multitude of advantages:
1 Entire workflow in one program (RStudio)
2 Research and reports reproducible at the click of a button
3 Combines R functionality and LaTeX formatting (if desired)
4 Consistent formatting
5 Clear presentation of code
6 Dynamic documents (you can generate various output document types)
7 Applicable to almost any output document type you may need (e.g. manuscripts, presentations, posters)
Me
Need Statistical Advice?
Find me in room 318, building 1540 (Fridays, 09.00-12.00) or via erik.kusch@bio.au.dk.